Abstract

Background: Tall fescue and meadow fescue are important temperate pasture grasses that form mutualistic associations with asexual Neotyphodium endophytes. The most frequently identified endophyte of Continental allohexaploid tall fescue is Neotyphodium coenophialum, while representatives of two other taxa (FaTG-2 and FaTG-3) have been described as colonising decaploid and Mediterranean hexaploid tall fescue, respectively. In addition, a recent study identified two other putatively novel endophyte taxa from Mediterranean hexaploid and decaploid tall fescue accessions, which were designated as uncharacterised Neotyphodium species (UNS) and FaTG-3-like, respectively. In contrast, diploid meadow fescue mainly forms associations with the endophyte taxon Neotyphodium uncinatum, although a second endophyte taxon, termed N. siegelii, has also been described.

Results: Multiple copies of the translation elongation factor 1-α (tefA) and β-tubulin (tub2) 'house-keeping' genes, as well as the endophyte-specific perA gene, were identified for each fescue-derived endophyte taxon from whole genome sequence data. The assembled gene sequences were used to reconstruct evolutionary relationships between the heteroploid fescue-derived endophytes and putative ancestral sub-genomes derived from known sexual Epichloe species. In addition to the nuclear genome-derived genes, the complete mitochondrial genome (mt genome) sequence was obtained for each of the sequenced endophytes, and phylogenetic relationships between the mt genome protein-coding gene complements were also reconstructed.

Conclusions: Complex and highly reticulated evolutionary relationships between Epichloe-Neotyphodium endophytes have been predicted on the basis of multiple nuclear genes and entire mitochondrial protein-coding gene complements, derived from independent assembly of whole genome sequence reads. The results are consistent with previous studies while also providing novel phylogenetic insights, particularly through inclusion of data from the endophyte lineage-specific gene, as well as affording evidence for the origin of cytoplasmic genomes. In particular, the results of the present study imply the possible occurrence of at least two distinct E. typhina progenitors for heteroploid taxa, as well as the ancestral contribution of an endophyte species distinct from (although related to) contemporary E. baconii to the extant hybrid species. Furthermore, the present study confirmed the distinct taxonomic status of the newly identified fescue endophyte taxa, FaTG-3-like and UNS, which are consequently proposed to be renamed FaTG-4 and FaTG-5, respectively.
Background

Neotyphodium endophytes are asexual fungal species that form mutualistic interactions with a number of cool-season grasses, including ryegrasses (Lolium spp.) and fescues (Festuca spp.). Endophytes are disseminated through dispersal of plant seeds and obtain nutrition and protection from the host plant, while conferring superior persistence characteristics on the grass, such as improved mineral uptake and drought tolerance [1,2]. Furthermore, symbiotic fungal endophytes protect the host plant from vertebrate and invertebrate herbivores through production of bioprotective alkaloids. To date, four major classes of alkaloids have been identified from endophyte infection of host grasses [3]: peramine and lolines, which deter invertebrate predation [4-6], and indole-diterpenes and ergot alkaloids, which are toxic to grazing vertebrates such as ruminant livestock [7,8]. Tall fescue (Festuca arundinacea Schreb. syn. Lolium arundinaceum [Schreb.] Darbysh.) and meadow fescue (Festuca pratensis Huds. [syn. Lolium pratense (Huds.) Darbysh.]) are two fescue taxa that are particularly important as temperate pasture grasses, and both form associations with Neotyphodium endophytes. Tall fescue exhibits multiple ploidy level variants from tetraploid to decaploid [9,10]. Furthermore, within the hexaploid type, the commonly cultivated Continental and Mediterranean morphotypes have been deduced to arise from differing diploid progenitor genomes [11]. The most frequently identified endophyte of Continental allohexaploid tall fescue is Neotyphodium coenophialum (Morgan-Jones et Gams) Glenn, Bacon et Hanlin [12], while representatives of two other taxa, Festuca arundinacea taxonomic group 2 (FaTG-2) and Festuca arundinacea taxonomic group 3 (FaTG-3), have been described as colonising decaploid and Mediterranean hexaploid tall fescue, respectively [9,13]. In addition, a recent study based on simple sequence repeat (SSR) genotyping identified two other putatively novel endophyte taxa from Mediterranean hexaploid and decaploid tall fescue accessions, which were designated as uncharacterised Neotyphodium species (UNS) and FaTG-3-like [9] (later named FaTG-4 [14]), respectively. In contrast, diploid meadow fescue mainly forms associations with the endophyte taxon Neotyphodium uncinatum (Gams, Petrini and Schmidt) Glenn, Bacon, Price and Hanlin, although a second endophyte taxon, termed N. siegelii, has also been described [15].

All of these previously characterised and novel endophyte taxa of tall and meadow fescue exhibit heteroploid genome constitutions, based either on the presence of multiple copies of known gene sequences [14,16-18] or on the generation of multiple PCR amplicons from specific SSR loci [9,19]. On this basis, such endophytes have been proposed to originate from sexual Epichloe species through a series of interspecific hybridisation events. Although the probable origins of several Neotyphodium species are believed to be well understood, those of the novel UNS and FaTG-3-like taxa are yet to be determined. In addition, the resolution of such phylogenomic analyses depends on both the number and the nature of the DNA sequences used to perform them.

Nuclear protein-coding gene sequences have been employed in previous studies of fungal endophytes to elucidate evolutionary relationships at the intraspecific and interspecific levels, through phylogenetic analysis of partial sequences representing orthologous intronic regions of the translation elongation factor 1-α (tefA), β-tubulin (tub2) and actin genes [14,18,20,21]. However, each of these genes encodes a protein that controls an essential function in eukaryotic cells, and the genes are hence not exclusive to fungal species or, indeed, to Neotyphodium endophytes. Phylogenetic analysis of gene sequences that are specific to the Epichloe-Neotyphodium lineage could therefore provide higher resolution of the phylogenetic affinities between sexual and asexual endophyte taxa. The perA gene encodes the enzyme that catalyses synthesis of the invertebrate-deterrent alkaloid peramine [6], and hence provides an ideal candidate. In previous studies, the number of perA gene copies present in heteroploid endophytes has been shown to be consistent with proposed hybrid origins, irrespective of peramine production levels [6]. In addition, as a presumably dispensable gene, perA exhibits a higher rate of molecular evolution than the essential tefA and tub2 genes [22], providing the capacity to resolve close taxonomic relationships. In a previous study, phylogenetic analysis based on sequenced PCR amplicons from the perA gene was performed for a selected set of fescue-derived endophytes [23]. However, inclusion of a larger number of perA genes, including those from putative progenitor Epichloe species, would be expected to improve the resolution of the analysis and provide a deeper understanding of Epichloe-Neotyphodium phylogenomics.

In addition to nuclear genes, sequence variation within the mitochondrial (mt) genome may be used to fully interpret the interspecific hybridisation process by which heteroploid endophytes are believed to have arisen. Following two-way interspecific hybridisation of filamentous fungi, segregation of mt genomes is believed to result in pure, unmixed derivatives, although the temporary presence of heteroplasmons may permit intergenomic recombination events [24]. Mt DNA also offers an advantage in terms of copy number, which has been estimated to range from ten to several thousand copies per cell [25,26]. Consequently, the depth of coverage of mt DNA in a whole genome sequencing dataset will be considerably higher than that of nuclear genomic regions, increasing confidence in the analysis. Furthermore, comparisons of molecular evolution between the nuclear and mitochondrial genomes of fungi have revealed accelerated rates in the latter [27], potentially increasing the capacity to discriminate between closely related taxa.
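The coverage argument above can be made concrete with a back-of-envelope calculation: if reads are sampled uniformly, the ratio of mean mt read depth to mean nuclear read depth approximates the number of mt genome copies per nuclear genome copy in the extracted DNA. The Python sketch below is illustrative only; the function name and the depths used in the usage lines are hypothetical, not values from this study.

```python
def mt_copies_per_nuclear_genome(mt_depth: float, nuclear_depth: float) -> float:
    """Approximate mt genome copies per nuclear genome copy from mean read depths.

    Assumes uniform sampling of reads across both genomes, so that read depth
    scales linearly with copy number in the DNA extract.
    """
    if nuclear_depth <= 0:
        raise ValueError("nuclear depth must be positive")
    return mt_depth / nuclear_depth


if __name__ == "__main__":
    # Hypothetical depths: 2,000x over mt contigs versus 50x over nuclear contigs.
    ratio = mt_copies_per_nuclear_genome(2000.0, 50.0)
    print(f"~{ratio:.0f} mt genome copies per nuclear genome copy")
```

In practice, local coverage varies along both genomes (for example with GC content), so such a ratio is only an order-of-magnitude indicator.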

The present study describes phylogenetic analysis of fescue-derived endophytes based on three nuclear protein-coding genes (tefA, tub2, perA) and the complete protein-coding gene complement of the mt genome. This study has provided confirmation of known relationships as well as additional novel insights, owing to the higher resolution permitted by multiple gene analysis. The majority of previous studies of Epichloe-Neotyphodium species have been based on sequences from PCR amplicons of partial gene sequences. In contrast, the present study describes, for the first time, the identification and use of complete sequences of the relevant genes derived solely from whole genome sequence datasets.

Methods 

Endophyte isolates and DNA extraction 

Phylogenetic analysis was performed on 16 endophyte isolates (Table 1) representing the known taxa N. coenophialum, N. uncinatum, FaTG-2 and FaTG-3, as well as the two putatively distinct taxa previously designated as FaTG-3-like and UNS [9]. Genomic DNA was extracted from lyophilized mycelia by cetyltrimethylammonium bromide (CTAB) extraction [28], and the quality and quantity of the DNA were assessed by both agarose gel electrophoresis and specific absorbance measurements using the NanoDrop 2000 Spectrophotometer (Thermo Scientific, Waltham, Massachusetts, USA).

Table 1 Endophyte isolates used for phylogenetic analysis

Endophyte isolate   Species or taxon   Host species      Origin        Source(a)
E34                 N. coenophialum    F. arundinacea    -             RBG
NEA14               N. coenophialum    F. arundinacea    France        NZA
NEA16               N. coenophialum    F. arundinacea    France        NZA
NEA20               N. coenophialum    F. arundinacea    France        NZA
NEA22               N. coenophialum    F. arundinacea    Spain         NZA
NEA21               FaTG-3(c)          F. arundinacea    Morocco       NZA
NEA23               FaTG-3             F. arundinacea    Tunisia       NZA
NEA17               FaTG-2(b)          F. arundinacea    Spain         NZA
NEA32               FaTG-2             F. arundinacea    Morocco       NZA
NEA19               UNS                F. arundinacea    Algeria       NZA
NEA18               UNS                F. arundinacea    Sardinia      NZA
E81                 N. uncinatum       F. pratensis      -             RBG
NEA33               FaTG-3-like        F. arundinacea    Morocco       NZA
9707                E. baconii         Agrostis tenuis   Switzerland   ETH Zurich
9340                E. typhina         Poa pratensis     Switzerland   ETH Zurich
9636                E. typhina         Poa trivialis     Switzerland   ETH Zurich
SE                  N. lolii           Lolium perenne    New Zealand   DEPI

(a) ETH Zurich: Eidgenössische Technische Hochschule Zürich (Swiss Federal Institute of Technology, Zurich, Switzerland); NZA: New Zealand Agriseeds Ltd, Christchurch, New Zealand; RBG: Royal Barenbrug Group, Nijmegen-Noord, Netherlands; DEPI: Department of Environment and Primary Industries, Victoria, Australia.
(b) FaTG-2: Festuca arundinacea taxonomic group 2.
(c) FaTG-3: Festuca arundinacea taxonomic group 3.

Paired-end library preparation and sequencing 

Genomic DNA was fragmented in a Covaris instrument (Woburn, MA, USA) to a size range of 100-900 bp. For each endophyte DNA sample, paired-end libraries with inserts of c. 400 bp were prepared using the standard protocol (TruSeq DNA Sample Prep V2 Low Throughput; Illumina Inc., San Diego, USA) with paired-end adaptors. Library quantification was performed using the KAPA library quantification kit (KAPA Biosystems, Boston, USA). Paired-end libraries were pooled according to the attached adaptors and sequenced on the HiSeq2000 platform (Illumina) following the manufacturer's standard protocol.

Processing and assembly of sequence data 

All generated sequence reads were quality-controlled by quality-based filtering and trimming using a custom Python script, which calculates quality statistics and stores the trimmed reads in several FASTQ files. Data assembly was performed using the Linux-based de novo assembler Velvet version 1.1.06 [29]. For Velvet assembly, different hash lengths (k-mer sizes) ranging from 39 to 51 were tested as appropriate for different sequence read sets, and the minimum contig length was always set to 200 bp. The expected coverage and coverage cut-off parameters were both set to auto.
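The custom trimming script is not described in detail. As a hedged illustration only, the Python sketch below applies one common approach (trimming bases from the 3' end while their Phred quality is below a threshold, then discarding reads that become too short). The Q20 threshold, the 50-base minimum length, the Phred+33 encoding and all function names are assumptions, not the parameters or code used in this study.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string (Phred+33 assumed) into integer scores."""
    return [ord(c) - offset for c in qual]


def trim_3prime(seq: str, qual: str, min_q: int = 20) -> Tuple[str, str]:
    """Remove bases from the 3' end until a base with quality >= min_q is reached."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from well-formed four-line FASTQ records."""
    it = iter(line.rstrip("\n") for line in lines)
    for header in it:
        seq = next(it)
        next(it)  # '+' separator line
        qual = next(it)
        yield header, seq, qual


def filter_and_trim(lines: Iterable[str], min_q: int = 20, min_len: int = 50) -> Iterator[Record]:
    """Trim each read and keep only those that remain at least min_len bases long."""
    for header, seq, qual in read_fastq(lines):
        tseq, tqual = trim_3prime(seq, qual, min_q)
        if len(tseq) >= min_len:
            yield header, tseq, tqual


if __name__ == "__main__":
    demo = ["@r1", "ACGTACGT", "+", "IIIIII##", "@r2", "ACGT", "+", "####"]
    for rec in filter_and_trim(demo, min_q=20, min_len=5):
        print(rec)
```

With these demo records, the two low-quality 3' bases of `@r1` are removed and `@r2` is dropped entirely. For paired-end data, a real workflow must also keep mates synchronised (discarding or flagging a read's partner when the read is removed); this sketch handles single reads only.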

Assembly of nuclear gene sequences 

The presence and copy number of the tub2, tefA and perA genes (reference GenBank accession numbers: tub2, AY722412; tefA, FJ660614; perA, AB205145) in each endophyte genome were initially determined through nucleotide BLAST (Basic Local Alignment Search Tool) [30] analysis, using contigs from the optimised Velvet assembly of total reads as the database. To assemble each gene copy, reads matching each reference gene were identified from a database of all trimmed reads from each endophyte genome through a BLAST similarity search, with an E value threshold of 0.1. From this BLAST output, all corresponding paired reads (both forward and reverse) were extracted from the database, and the second reads were reverse-complemented. The first and second reads of each pair were concatenated and used as BLASTN queries against a database consisting solely of the relevant reference gene sequence. Concatenated reads that matched in the antisense orientation were reverse-complemented, so that all concatenated reads were oriented 5' to 3' relative to the reference gene sequence. Subsequently, the reads were separated into two distinct sets (designated read1 and read2), and each set was used separately as BLASTN queries against the gene sequence to be assembled. The aim of these individual BLAST searches was to estimate the position of each read relative to the reference sequence. After positions had been assigned, each read was padded to the appropriate position along the reference gene sequence using a customised PERL script. The padded read pairs (when both reads were present in the BLAST output) were then concatenated and saved in FASTA format. Using the graphical multiple sequence editor SeaView [31], the padded reads were manually assembled into the corresponding gene sequences. Putatively functional sequences were predicted for the multiple perA gene copies through translation of each gene sequence using the ExPASy online DNA sequence translation tool [32].
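The read-placement step can be illustrated with a short sketch. The authors used a customised PERL script; the Python below is not that script, only an illustration of the same idea under stated assumptions: the 1-based start coordinate of each read on the reference is taken from its BLAST hit (hypothetically, the subject start of a tabular hit), reads are already in the 5' to 3' orientation of the reference, and the helper names are hypothetical.

```python
from typing import Iterable, Tuple

# Complement table covering upper- and lower-case bases plus N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def pad_to_reference(read: str, start: int, ref_len: int, gap: str = "-") -> str:
    """Place a read at 1-based position `start` on a reference of length ref_len.

    The result always has ref_len characters, so padded reads line up
    column by column when opened in an alignment editor such as SeaView.
    """
    end = start + len(read) - 1
    if start < 1 or end > ref_len:
        raise ValueError(f"read spanning {start}-{end} does not fit a {ref_len} bp reference")
    return gap * (start - 1) + read + gap * (ref_len - end)


def to_fasta(records: Iterable[Tuple[str, str]]) -> str:
    """Format (name, padded sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


if __name__ == "__main__":
    ref_len = 12
    read1 = "ACGTA"                      # first mate, already in reference orientation
    read2 = reverse_complement("CCGTT")  # second mate, reverse-complemented to "AACGG"
    rows = [
        ("pair1/1", pad_to_reference(read1, 1, ref_len)),
        ("pair1/2", pad_to_reference(read2, 8, ref_len)),
    ]
    print(to_fasta(rows), end="")
```

Because every padded row has the reference length, the FASTA file opens as a ready-made alignment, which is what makes the subsequent manual assembly in an editor practical.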

Phylogenetic analysis of nuclear genes 

Multiple alignments of the complete gene sequences for each selected gene were performed individually using ClustalW [33] with default parameters. To reconstruct tree topology, parsimony, maximum likelihood (ML) and neighbour-joining (NJ) methods were used as implemented in MEGA 5 [34], with default parameters and 1,000 bootstrap replicates. Gene sequences available in GenBank from related endophyte species were included where appropriate, and the corresponding accession numbers are provided in each tree diagram. After the proposed sub-genome origin of each gene copy had been identified from the individual gene trees, the three nuclear genes were concatenated in the order tub2-tefA-perA and aligned using ClustalW with default parameters. For the concatenated multiple sequence alignment, the phylogenetic topology was reconstructed using MEGA with default parameters and 1,000 bootstrap replicates. Phylogenetic networks were constructed for the aligned concatenated gene sequences using the NeighborNet algorithm [35] on the Nei-Li pairwise distance matrix, and network diagrams were produced using the program SplitsTree4 [36].
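Two steps in this workflow can be sketched briefly: joining the aligned tub2, tefA and perA sequences of each gene copy in the fixed order used here, and drawing bootstrap pseudo-replicates by resampling alignment columns with replacement, which is the resampling that underlies the 1,000 replicates computed in MEGA. The Python below is illustrative only. It assumes each gene has already been aligned (the authors instead aligned the concatenated sequences with ClustalW), the data structure and function names are hypothetical, and tree inference on each replicate is left to dedicated software.

```python
import random
from typing import Dict

Alignment = Dict[str, str]  # sequence label -> aligned sequence (equal lengths)

GENE_ORDER = ("tub2", "tefA", "perA")  # concatenation order used in the text


def concatenate(per_gene: Dict[str, Alignment], order=GENE_ORDER) -> Alignment:
    """Join each label's per-gene aligned sequences in the given gene order.

    Only labels present in every gene alignment are kept.
    """
    shared = set.intersection(*(set(per_gene[g]) for g in order))
    return {label: "".join(per_gene[g][label] for g in order) for label in sorted(shared)}


def bootstrap_replicate(aln: Alignment, rng: random.Random) -> Alignment:
    """Resample alignment columns with replacement (one bootstrap pseudo-replicate).

    The same column indices are applied to every sequence, so each sampled
    site keeps the bases of all sequences together.
    """
    length = len(next(iter(aln.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {label: "".join(seq[i] for i in cols) for label, seq in aln.items()}


if __name__ == "__main__":
    per_gene = {
        "tub2": {"copyA": "AC", "copyB": "AG"},
        "tefA": {"copyA": "T", "copyB": "T"},
        "perA": {"copyA": "GG", "copyB": "GA"},
    }
    concat = concatenate(per_gene)
    print(concat)  # {'copyA': 'ACTGG', 'copyB': 'AGTGA'}
    rng = random.Random(42)
    replicates = [bootstrap_replicate(concat, rng) for _ in range(1000)]
    print(len(replicates))
```

Each replicate has the same length as the original alignment; clade support is then the percentage of replicate trees that recover a given clade.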

Assembly of mitochondrial genomes 

Contigs of mitochondrial (mt) genome origin were initially identified by nucleotide BLAST at an E value threshold of 0.001, aligning a database containing all contigs from the optimised Velvet assembly (as described above) against the mt genome of the N. lolii standard endophyte (SE) strain as a reference [37] (GenBank accession number KF906135). For each candidate endophyte, a set of contigs with higher read depth coverage showed significant matches to the reference mt genome sequence. A cut-off value for the read depth coverage of each mt genome was identified from the BLAST output, and a second Velvet assembly was performed with this value assigned as the coverage cut-off, in order to select contigs derived from the mt genome. A range of k-mer values was tested, and a final assembly was accepted on the basis of features such as the total number of assembled contigs, the N50 value and the cumulative contig length. For those mt genomes with few (2-5) contigs, ordering was performed with BLASTN (E value threshold of 0.001) against the pre-existing SE mt genome sequence, and overlapping regions were manually linked. To confirm gaps observed in comparison to the SE mt genome, trimmed reads were aligned using the Burrows-Wheeler Alignment (BWA) tool [38] with the maximum number of gap openings set to five. Mapped reads were viewed using Tablet 1.12.02.06 [39], a graphical viewer for sequence alignments. Observed gaps were further confirmed by grouping the observed gap positions within each endophyte species or taxon.
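Two quantities used above, contig coverage and N50, are easy to compute from a Velvet contig file. The Python sketch below assumes Velvet's usual `NODE_<id>_length_<n>_cov_<c>` contig naming, takes contig lengths from the sequences themselves rather than from the header (which Velvet reports in k-mers), and keeps contigs at or above a coverage cut-off. The cut-off value in the usage lines and all function names are hypothetical placeholders, not values or code from this study.

```python
from typing import Dict, List, Tuple


def parse_velvet_coverage(header: str) -> float:
    """Extract k-mer coverage from a Velvet contig name such as 'NODE_3_length_950_cov_812.5'."""
    fields = header.lstrip(">").split("_")
    return float(fields[fields.index("cov") + 1])


def n50(lengths: List[int]) -> int:
    """Largest length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def filter_by_coverage(contigs: Dict[str, str], cutoff: float) -> Dict[str, str]:
    """Keep contigs whose Velvet k-mer coverage is at or above the cut-off."""
    return {name: seq for name, seq in contigs.items() if parse_velvet_coverage(name) >= cutoff}


def summarise(contigs: Dict[str, str]) -> Tuple[int, int, int]:
    """Return (number of contigs, N50, cumulative length in bp) for an assembly."""
    lengths = [len(seq) for seq in contigs.values()]
    return len(lengths), n50(lengths), sum(lengths)


if __name__ == "__main__":
    contigs = {
        "NODE_1_length_360_cov_950.0": "A" * 400,
        "NODE_2_length_260_cov_15.2": "C" * 300,
        "NODE_3_length_160_cov_870.4": "G" * 200,
    }
    mt_like = filter_by_coverage(contigs, cutoff=500.0)  # hypothetical cut-off
    print(summarise(mt_like))  # (2, 400, 600)
```

The three summary values correspond to the criteria named above (number of contigs, N50 and cumulative length) for comparing assemblies built with different k-mer sizes.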

Protein-coding gene sequences were identified through similarity searches between each mt genome sequence and the individual mitochondrial protein-coding gene sequences of the clavicipitacean entomopathogenic fungus Metarhizium anisopliae (GenBank accession number NC_008068). The identified protein-coding genes were concatenated according to the gene order observed in the M. anisopliae mt genome.
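The concatenation step can be sketched as follows. The gene names and the order list in the usage lines are hypothetical placeholders (the actual order would be read from the M. anisopliae NC_008068 annotation), and the check that every reference gene has been identified is an added safeguard rather than a step described in the text.

```python
from typing import Dict, Sequence


def concatenate_in_reference_order(genes: Dict[str, str], reference_order: Sequence[str]) -> str:
    """Concatenate protein-coding gene sequences following a reference gene order.

    Raises KeyError if any gene in the reference order was not identified,
    so that incomplete gene complements are not silently concatenated.
    """
    missing = [name for name in reference_order if name not in genes]
    if missing:
        raise KeyError(f"genes not identified: {', '.join(missing)}")
    return "".join(genes[name] for name in reference_order)


if __name__ == "__main__":
    # Hypothetical three-gene example; a real run would use all 13 genes.
    order = ["geneA", "geneB", "geneC"]
    found = {"geneC": "TTT", "geneA": "ATG", "geneB": "GGC"}
    print(concatenate_in_reference_order(found, order))  # ATGGGCTTT
```

Using one fixed reference order for every genome ensures that homologous genes occupy the same block of columns in the subsequent multiple alignment.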

Phylogenetic analysis of mt genome protein-coding gene complement

Multiple alignment of the concatenated mitochondrial protein-coding gene complements from the 19 endophytes and their counterparts in M. anisopliae was performed using the M-LAGAN program within the mVISTA online suite of computational tools [40], with default parameters. Alignments were manually edited to correct misalignments that may have accumulated due to overlapping gene fragments. To reconstruct the tree topology, parsimony, ML and NJ methods were used as implemented in MEGA 5 with default parameters and 1,000 bootstrap replicates. Furthermore, to study the level of identity of each mt genome protein-coding complement relative to that of M. anisopliae, aligned sequences were visualized using a VISTA plot [41].
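A VISTA plot displays percent identity computed in a sliding window along a pairwise alignment. The Python sketch below shows that calculation for two aligned sequences. The 100 bp default window, the step size and the convention of ignoring columns that contain a gap in either sequence are assumptions made here; mVISTA's own implementation and defaults may differ.

```python
from typing import List, Optional, Tuple


def window_identity(
    seq_a: str, seq_b: str, window: int = 100, step: int = 1
) -> List[Tuple[int, Optional[float]]]:
    """Percent identity in sliding windows over two aligned sequences.

    Returns (0-based window start, identity %) pairs. Columns with a gap in
    either sequence are ignored; a window containing only such columns gives None.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    results: List[Tuple[int, Optional[float]]] = []
    for start in range(0, len(seq_a) - window + 1, step):
        pairs = [
            (x, y)
            for x, y in zip(seq_a[start:start + window], seq_b[start:start + window])
            if x != "-" and y != "-"
        ]
        if not pairs:
            results.append((start, None))
            continue
        matches = sum(x.upper() == y.upper() for x, y in pairs)
        results.append((start, 100.0 * matches / len(pairs)))
    return results


if __name__ == "__main__":
    print(window_identity("AAAA", "AATA", window=2))  # [(0, 100.0), (1, 50.0), (2, 50.0)]
```

Plotting the identity values against window position, with a horizontal line at a chosen identity threshold, reproduces the basic layout of a VISTA conservation plot.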

Results 

Identification of individual nuclear gene copies 

The presence of the three nuclear genes (tub2, tefA and perA) was determined for the 13 fescue-derived endophyte genomes and for all other reference taxa (Epichloe spp. and N. lolii) used in this study. The number of copies of each gene within each taxon was then determined (Table 2). For each gene, three copies were observed within individual N. coenophialum genomes, while two copies were observed for all other fescue-derived taxa, apart from N. uncinatum, which contained a single copy of the tub2 gene. Moreover, one of the assembled perA gene copies from the FaTG-2, FaTG-3 and UNS genomes revealed a common variant structure based on large- to moderate-sized deletions (coordinates 1251-1878 bp and 4590-4918 bp; Figure 1).

Table 2 Number of gene copies identified from each endophyte taxon represented in the analysis

                     Observed number of gene copies
Endophyte taxon      perA    tub2    tefA
N. coenophialum      3       3       3
FaTG-2               2       2       2
FaTG-3               2       2       2
FaTG-3-like          2       2       2
UNS                  2       2       2
N. uncinatum         2       1       2

Phylogenetic relationships based on nuclear genes 

Phylogenetic relationships were reconstructed based on the individual sequences of the perA, tub2 and tefA genes, as well as on the concatenated sequences of all three genes, using parsimony, ML and NJ methods (the related alignments can be found at URL: http://purl.org/phylo/treebase/phylows/study/TB2:S14923). All positions containing gaps and missing data were eliminated during each analysis. Corresponding gene-specific DNA sequences from other taxa that had previously been deposited in GenBank were included. Similar tree topologies were obtained with all three methods, and the most parsimonious phylograms were selected (Figures 2, 3 and 4 and Additional files 1, 2, 3, 4 and 5). The GenBank accession numbers of the nuclear genes from the fescue-derived endophytes are provided in Additional file 6. Maximum parsimony analysis of the individual sequences of each of the three genes resolved more than one most parsimonious tree, and so bootstrap consensus trees inferred from 1000 replicates are displayed for each gene. In each phylogram, individuals from the same endophyte taxon clustered together and were separated from the Epichloe isolates used in the study. Apart from the placement of E. baconii and E. amarillans within the perA- and tefA-specific trees, phylogenetic analysis of the individual genes resolved similar genomic relationships between endophyte taxa. All major clades defined by single-gene phylogenies were also strongly supported by the phylogeny of the concatenated perA, tub2 and tefA genes, which showed a similar tree topology. Maximum parsimony analysis of the concatenated genes yielded a single most parsimonious tree with high bootstrap support for the majority of individual branches.
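Eliminating every alignment position that contains a gap or missing data in any sequence (the "complete deletion" treatment) can be sketched in Python as below. Treating '-', '?' and 'N' as gap or missing-data symbols is an assumption of this illustration, and the function name is hypothetical; MEGA's own handling of ambiguity codes may differ.

```python
from typing import Dict

Alignment = Dict[str, str]  # sequence label -> aligned sequence


def complete_deletion(aln: Alignment, missing: str = "-?N") -> Alignment:
    """Remove every column in which any sequence has a gap or missing-data symbol."""
    lengths = {len(seq) for seq in aln.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    (length,) = lengths
    keep = [
        i for i in range(length)
        if all(seq[i].upper() not in missing for seq in aln.values())
    ]
    return {label: "".join(seq[i] for i in keep) for label, seq in aln.items()}


if __name__ == "__main__":
    aln = {"copy1": "AC-GT", "copy2": "ACTGN"}
    print(complete_deletion(aln))  # {'copy1': 'ACG', 'copy2': 'ACG'}
```

Complete deletion keeps only sites shared by all sequences, which makes trees directly comparable across taxa at the cost of discarding indel-rich regions.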



In all instances, the phylogenetic trees were predominantly separated into two major groups representing different Epichloe species. For example, in the tefA gene-specific phylogram (Figure 3), Group 1 contained the taxa E. festucae, E. baconii, E. amarillans and E. bromicola, while Group 2 contained E. sylvatica, E. typhina and E. clarkii. Furthermore, individual gene copies from the fescue-derived endophytes were located close to the reference E. festucae-, E. bromicola- and E. typhina-derived sequences, suggesting affinities to putative sub-genome components of the heteroploid taxa. However, in all instances, members of the fescue-derived endophyte (N. coenophialum, FaTG-2, FaTG-3, FaTG-3-like and UNS) gene copy 1 (FGC1) clade were not closely related to sequences from any of the currently included Epichloe endophytes (Figures 2, 3 and 4 and Additional file 2). In the perA-based phylogeny, E. amarillans formed a sister group to FGC1, while addition of a partial gene sequence from E. amarillans to the tefA-based phylogeny generated a separate clade also containing E. baconii. Nevertheless, in the absence of E. amarillans, the tub2-based and concatenated gene-based analyses placed FGC1 as a sister group to E. baconii.

The genomes of the known heteroploid endophyte taxa N. coenophialum, N. uncinatum and FaTG-3 contributed gene copies to both major phylogram groups, consistent with hybrid origins involving either the relevant Epichloe species or closely related taxa. In contrast, the multiple gene copies from FaTG-2 and UNS were located only within Group 1. High levels of similarity between sequences from different endophyte taxa suggested the presence of common sub-genomic components. For instance, N. coenophialum and N. uncinatum genotypes always contributed gene copies that were identical or very closely related to one another, and similar to those from E. typhina. Although N. coenophialum gene copies showed close affinities to those from other fescue endophyte taxa within Group 1, distinct clusters were generated in all instances. A further point of interest was the close relationship between the FaTG-2-derived perA and tefA gene copies and those obtained from both E. festucae and its asexual anamorph, N. lolii.

[Diagram: the N. lolii (SE) perA gene (scale c. 8.5 kb) above perA gene copy 1 of FaTG-2, FaTG-3 and UNS, which lacks a 627 bp region (1251-1878 bp) and a 328 bp region (4590-4918 bp).]

Figure 1 Schematic diagram of perA gene copy 1 of FaTG-2, FaTG-3 and UNS aligned with the perA gene sequence of N. lolii (SE).

[Tree diagram: gene copies resolve into Group 1 (containing the FGC1 clade) and Group 2; legend colours distinguish N. coenophialum, FaTG-2, FaTG-3, UNS, N. uncinatum and FaTG-3-like.]

Figure 2 Bootstrap consensus tree generated through parsimony analysis of perA gene sequences among reference endophyte isolates and selected fescue-derived endophytes. Bootstrap values greater than 70% from 1000 bootstrap replicates are shown next to each branch. Endophyte taxa are colour-coded as indicated in the legend. Endophyte taxon abbreviations preceding isolate names are as follows: Nc = N. coenophialum, Nu = N. uncinatum, UNS = uncharacterised Neotyphodium species. The perA gene sequence of E. amarillans was derived from whole genome shotgun sequence available in GenBank under accession number AFRF00000000. The GenBank accession numbers of the perA genes derived from fescue-derived endophytes are provided in Additional file 6.

As an alternative way to explore the reticulated evolutionary relationships between sexual Epichloe progenitor species and heteroploid Neotyphodium endophytes, a phylogenetic network diagram was constructed based on the concatenated nuclear gene sequences (Figure 5). Distinct clusters of asexual endophyte-derived gene copies formed around the reference E. festucae and E. typhina sequences, corresponding to the respective clades in Group 1 and Group 2 of the phylograms, and further supporting the hypothesis of progenitor relationships. Separation of E. baconii from FGC1 gene copies was also consistent with the phylogram structure. The network analysis further demonstrated the differentiation of FaTG-2- and UNS-derived gene copies within the E. festucae-containing clade of the phylograms, despite similarity in both instances to E. festucae. Similar results were obtained for FaTG-3- and FaTG-3-like-derived gene copies corresponding to members of FGC1 within the phylograms and, to a lesser extent, of Group 2.
Figure 3 Bootstrap consensus tree generated through parsimony analysis of tefA gene sequences among reference endophyte isolates and selected fescue-derived endophytes. Diagram properties are as described for Figure 2. P indicates partial gene sequences obtained from GenBank. Accession numbers are provided adjacent to species names. The GenBank accession numbers of the tefA genes derived from fescue-derived endophytes are provided in Additional file 6.
Figure 4 Bootstrap consensus tree generated through parsimony analysis of tub2 gene sequences among reference endophyte isolates and selected fescue-derived endophytes. Diagram properties are as described for Figure 2. The GenBank accession numbers of the tub2 genes derived from fescue-derived endophytes are provided in Additional file 6.



Figure 5 NeighborNet network of relationships between copies of the concatenated tub2, tefA and perA gene sequences from reference endophyte isolates and selected fescue endophytes.

Putatively functional copies of the perA gene were predicted based on sequence translation (to produce an intact biosynthetic enzyme) and assigned to putative progenitor origins based on phylogenetic affinities with Epichloe species that are known to contain such sequences (Table 3). Putative gene functionality was consistent between the predicted sub-genomic components of each taxon. For example, both the E. festucae- and E. typhina-like perA gene copies (from Group 1 and Group 2) were predicted to be functional for all N. coenophialum isolates used in this study. In contrast, the perA gene copy characteristic of FGC1 was predicted to be non-functional for the N. coenophialum, UNS, FaTG-2, FaTG-3 and FaTG-3-like gene copies. Furthermore, predicted gene functionality was also consistent with the results of preliminary alkaloid profile analysis. For example, both perA gene copies of UNS endophytes were predicted to be non-functional, and these endophytes have not been observed to produce peramine in planta (P. Ekanayake, unpublished).
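The translation-based prediction can be approximated in a few lines. The Python sketch below translates a coding sequence with the standard genetic code and calls a copy putatively functional only if it has a whole number of codons, starts with Met, ends with a stop codon and has no premature stop. It assumes an intron-free coding sequence beginning at the start codon; it is a simplified stand-in for the ExPASy-based inspection described in the Methods, not a reproduction of it, and a copy carrying large in-frame deletions would still pass this check.

```python
from itertools import product
from typing import Dict

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon; unknown codons become 'X'."""
    cds = cds.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def is_putatively_functional(cds: str) -> bool:
    """True if the CDS has whole codons, starts with Met, ends with a stop
    codon and contains no premature stop codon."""
    if len(cds) % 3 != 0:
        return False
    protein = translate(cds)
    return protein.startswith("M") and protein.endswith("*") and "*" not in protein[:-1]


if __name__ == "__main__":
    print(translate("ATGAAATAA"))                    # MK*
    print(is_putatively_functional("ATGAAATAA"))     # True
    print(is_putatively_functional("ATGTAAAAATAA"))  # False (premature stop)
```

A frameshifting deletion typically shows up here as a premature stop codon or a missing terminal stop, which is why translation is a useful first screen for pseudogenised copies.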

Mitochondrial genome sequence structure 

General structural characteristics of the 19 mt genomes were determined (Table 4), revealing overall sizes ranging from 51,884 to 96,481 bp. With the exception of E. typhina, mt genome sizes varied only slightly within a given taxon, with larger differences observed between taxa. All genomes shared the same 13 protein-coding genes arranged in the same order, accounting for 15%-28% of the entire mt genome and showing c. 90% cumulative sequence identity to the out-group species, M. anisopliae. In contrast to the conservation of the protein-coding components, higher levels of sequence divergence were apparent within the intergenic regions, due to multiple insertion/deletion events relative to the N. lolii SE mt genome that was used as a reference. A further complication in this analysis was the presence of nuclear genome-derived sequences that showed more distant affinities to the mt DNA, perhaps generated by inter-organelle transfer and integration.
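The 15%-28% range is largely a consequence of differences in total genome size. Multiplying each total length in Table 4 by its protein-coding percentage gives the implied length of the protein-coding complement; the short Python check below performs only this arithmetic, using the values reported in Table 4 (the dictionary labels are just convenient names). On these figures, every genome, including the M. anisopliae out-group, carries roughly 14.58 kb of protein-coding sequence, so the variation in percentage reflects differences in the non-protein-coding portion of each genome.

```python
# (total mt genome length in bp, % protein-coding gene content), copied from Table 4.
TABLE4 = {
    "M. anisopliae NC_008068": (24673, 59.11),
    "N. coenophialum NEA22": (93968, 15.52),
    "N. coenophialum NEA14": (94628, 15.41),
    "N. coenophialum NEA16": (95628, 15.25),
    "N. coenophialum NEA20": (94924, 15.36),
    "N. coenophialum E34": (95719, 15.24),
    "UNS NEA18": (82857, 17.60),
    "UNS NEA19": (83619, 17.44),
    "FaTG-3 NEA23": (91949, 15.86),
    "FaTG-3 NEA21": (91820, 15.88),
    "FaTG-3-like 598829": (79875, 18.26),
    "FaTG-2 NEA17": (95822, 15.22),
    "FaTG-2 598852": (96481, 15.12),
    "N. uncinatum E81": (84718, 17.22),
    "E. typhina 9636": (85030, 17.15),
    "E. typhina 9340": (51884, 28.11),
    "E. baconii 9707": (61133, 23.86),
    "E. festucae": (69696, 20.93),
    "N. lolii SE": (88738, 16.44),
    "LpTG-2 NEA11": (88810, 16.42),
}


def protein_coding_bp(total_bp: int, percent: float) -> float:
    """Length of protein-coding sequence implied by a total length and a percentage."""
    return total_bp * percent / 100.0


if __name__ == "__main__":
    implied = {name: protein_coding_bp(*values) for name, values in TABLE4.items()}
    for name, bp in implied.items():
        print(f"{name:26s} {bp:9,.0f} bp")
    print(f"range: {min(implied.values()):,.0f}-{max(implied.values()):,.0f} bp")
```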

Table 3 Summary of gene translation studies for the perA gene

Endophyte taxon    Predicted functional gene copies    Results of gene translation in relation to proposed progenitor origin
N. coenophialum    2 (3)                               E.f (F), E.t (F), FGC1 (NF)
FaTG-2             1 (2)                               E.f (F), FGC1 (NF)
FaTG-3             1 (2)                               E.t (F), FGC1 (NF)
FaTG-3-like        1 (2)                               E.t (F), FGC1 (NF)
UNS                0 (2)                               E.f (NF), FGC1 (NF)
N. uncinatum       1 (2)                               E.t (F), E.br (NF)

Note: The total number of gene copies identified for each taxon is given in brackets next to the number of predicted functional gene copies. E.f = E. festucae, E.t = E. typhina, FGC1 = Fescue gene copy 1, E.br = E. bromicola, F = predicted functional gene, NF = predicted non-functional gene.
Table 4 Comparison of mt genomes of analysed endophytes

Sample name | Endophyte taxon | Number of contigs assembled | Coverage | Mt genome total length (bp) | % protein gene content | % ID within protein gene content
NC008068 | M. anisopliae | N/A | N/A | 24,673 | 59.11 | -
NEA22 | N. coenophialum | 4 | >800 | 93,968 | 15.52 | 90.7
NEA14 | N. coenophialum | 5 | >900 | 94,628 | 15.41 | 90.5
NEA16 | N. coenophialum | 1 | >1800 | 95,628 | 15.25 | 90.9
NEA20 | N. coenophialum | 5 | >1700 | 94,924 | 15.36 | 90.7
E34 | N. coenophialum | 4 | >2100 | 95,719 | 15.24 | 90.9
NEA18 | UNS | 4 | >1200 | 82,857 | 17.60 | 91.0
NEA19 | UNS | 1 | >2400 | 83,619 | 17.44 | 90.7
NEA23 | FaTG-3 | 4 | >2800 | 91,949 | 15.86 | 90.7
NEA21 | FaTG-3 | 1 | >700 | 91,820 | 15.88 | 90.8
598829 | FaTG-3-like | 1 | >2500 | 79,875 | 18.26 | 90.8
NEA17 | FaTG-2 | 2 | >1200 | 95,822 | 15.22 | 91.2
598852 | FaTG-2 | 4 | >3800 | 96,481 | 15.12 | 91.2
E81 | N. uncinatum | 3 | >970 | 84,718 | 17.22 | 90.6
9636 | E. typhina | 1 | >2600 | 85,030 | 17.15 | 90.6
9340 | E. typhina | 2 | >2500 | 51,884 | 28.11 | 90.5
9707 | E. baconii | 1 | >1400 | 61,133 | 23.86 | 89.9
- | E. festucae | N/A | N/A | 69,696 | 20.93 | 91.1
SE | N. lolii | N/A | N/A | 88,738 | 16.44 | 91.0
NEA11 | LpTG-2 | N/A | N/A | 88,810 | 16.42 | 91.1
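Every mt genome in Table 4 carries the same 13 protein-coding genes. Multiplying each total length by its percentage of protein gene content should therefore give a near-constant coding length, and any remaining size variation must lie outside the coding genes. The short check below reproduces that arithmetic from the tabulated values. The rows were transcribed by hand, and the variable names are illustrative.

```python
# Rows transcribed from Table 4: (sample, taxon, total mt genome length in bp, % protein gene content)
TABLE4 = [
    ("NC008068", "M. anisopliae", 24673, 59.11),
    ("NEA22", "N. coenophialum", 93968, 15.52),
    ("NEA14", "N. coenophialum", 94628, 15.41),
    ("NEA16", "N. coenophialum", 95628, 15.25),
    ("NEA20", "N. coenophialum", 94924, 15.36),
    ("E34", "N. coenophialum", 95719, 15.24),
    ("NEA18", "UNS", 82857, 17.60),
    ("NEA19", "UNS", 83619, 17.44),
    ("NEA23", "FaTG-3", 91949, 15.86),
    ("NEA21", "FaTG-3", 91820, 15.88),
    ("598829", "FaTG-3-like", 79875, 18.26),
    ("NEA17", "FaTG-2", 95822, 15.22),
    ("598852", "FaTG-2", 96481, 15.12),
    ("E81", "N. uncinatum", 84718, 17.22),
    ("9636", "E. typhina", 85030, 17.15),
    ("9340", "E. typhina", 51884, 28.11),
    ("9707", "E. baconii", 61133, 23.86),
    ("-", "E. festucae", 69696, 20.93),
    ("SE", "N. lolii", 88738, 16.44),
    ("NEA11", "LpTG-2", 88810, 16.42),
]


def implied_coding_bp(total_bp: int, percent_protein: float) -> float:
    """Back-calculate the protein-coding length (bp) from total length and its percentage."""
    return total_bp * percent_protein / 100.0


coding_bp = [implied_coding_bp(total, pct) for _sample, _taxon, total, pct in TABLE4]
noncoding_bp = [total - coding for (_s, _t, total, _p), coding in zip(TABLE4, coding_bp)]

if __name__ == "__main__":
    print(f"implied protein-coding length: {min(coding_bp):,.0f}-{max(coding_bp):,.0f} bp")
    print(f"remaining (non-coding) length: {min(noncoding_bp):,.0f}-{max(noncoding_bp):,.0f} bp")
```

Across all 20 rows, including the M. anisopliae out-group, the implied protein-coding length stays at roughly 14.58 kb, with a spread of about 10 bp. The 51,884-96,481 bp range in total size therefore lies almost entirely outside the protein-coding sequence.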



Phylogenetic relationships based on mitochondrial genome comparisons

Phylogenetic relationships were reconstructed based on the concatenated sequences of the 13 mitochondrial protein-coding genes from the fescue endophytes, with the equivalent sequences from M. anisopliae used as the out-group (Figure 6). Similar tree topologies were obtained with parsimony, ML and NJ methods (the related alignments can be found at URL: http://purl.org/phylo/treebase/phylows/study/TB2:S14923). As expected, individual sequences from the same taxon clustered together. A number of putative progenitor relationships, such as that between E. typhina and N. uncinatum, were readily apparent from the phylogram. Close relationships were revealed between the N. lolii and LpTG-2 mt genomes and that of their putative sexual progenitor, E. festucae, and similar but less close relationships were apparent for N. coenophialum and FaTG-2. Commonalities of mitochondrial genome structure were evident between the FaTG-3 and FaTG-3-like mt genomes, albeit with lower bootstrap support, but affinity to potential Epichloe progenitor genomes was less clear, although an E. festucae mt genome provides the most likely candidate. This result was inconsistent with the nuclear gene analyses in the present study, which revealed closer relationships of gene copies from the FaTG-3 and FaTG-3-like isolates to the putative FGC1 progenitor and to E. typhina. A clear differentiation of the UNS mt genome from those of the preceding groups was evident from this analysis, with higher levels of sequence similarity to E. baconii and E. typhina than to E. festucae.
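The mt phylogeny rests on two generic steps. The per-gene alignments are concatenated into a single supermatrix, and branch support is assessed by bootstrap resampling of alignment columns. The sketch below illustrates both steps under stated assumptions:

- Per-gene alignments are already available as taxon-to-sequence mappings.
- Taxa absent from a gene are padded with gaps. In this study all 13 genes were present in every genome, so this only matters in general use.
- Tree building itself (parsimony, ML or NJ) is left to dedicated software and is not shown.

The function names are hypothetical.

```python
import random
from typing import Dict, List


def concatenate_alignments(gene_alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-gene alignments (taxon -> aligned sequence) into one supermatrix.

    Taxa missing from a gene are padded with gaps so all rows keep the same length.
    """
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("each gene alignment must be non-empty with equal-length rows")
        width = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


def bootstrap_replicate(matrix: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """One nonparametric bootstrap pseudo-replicate: resample columns with replacement."""
    width = len(next(iter(matrix.values())))
    columns = [rng.randrange(width) for _ in range(width)]
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in matrix.items()}
```

Repeating `bootstrap_replicate` 1000 times and building a tree from each pseudo-replicate gives the replicate set from which the bootstrap percentages shown on the trees are counted.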

Discussion 

The results obtained in the present study have confirmed the accuracy of the previous assignment of endophyte accessions to distinct known taxonomic groups based on SSR polymorphism, along with the definition of several putative novel taxa [9]. The prior study permitted only phenetic classification, whereas analysis of individual nuclear gene sequences has further permitted exploration of genome complexity within the heteroploid endophyte taxa, as well as interpretation of their relationships with contemporary Epichloe species as representatives of putative progenitors.

Figure 6 Bootstrap consensus tree generated through maximum likelihood analysis of concatenated protein-coding gene sequences within the mitochondrial genomes of reference endophyte isolates and selected fescue endophytes. Diagram properties are as described for Figure 2. The tree was rooted by addition of the mt genome protein-coding gene complement of Metarhizium anisopliae.

Following assembly of the Illumina HiSeq2000 short reads with the Velvet algorithm, a large number of contigs was generated for the heteroploid fescue grass-derived endophyte genomes (see assembly statistics listed in Additional file 7). This observation indicated that although Velvet is well-suited to assembly of haploid genomes, it is less appropriate for heteroploid genomes. Furthermore, in instances characterised by multiple gene copies, Velvet was incapable of reconstructing the individual gene copies from short reads. However, the number of assembled contigs was sufficient to indicate the number of gene copies, and when evidence for multiple copies was obtained, individual genes were accurately assembled by a manual process.
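The copy-counting step described above can be approximated by asking how many assembled contigs share k-mers with a reference copy of the gene. This is a minimal sketch, not the procedure used in the study. The k-mer size and the `min_shared` threshold are arbitrary illustrative choices, and the function names are hypothetical. It also assumes each gene copy assembles into its own contig. In practice, fragmented or collapsed copies make such a count only a rough indication, which is why individual copies still had to be assembled manually.

```python
from typing import Dict, List, Set

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of the upper-cased sequence (empty if shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def contigs_matching_gene(contigs: Dict[str, str], gene: str,
                          k: int = 21, min_shared: int = 5) -> List[str]:
    """Names of contigs sharing at least `min_shared` k-mers with the gene on either strand.

    Each hit is treated as one putative gene copy (a rough proxy only).
    """
    gene_kmers = kmers(gene, k) | kmers(reverse_complement(gene), k)
    return sorted(
        name for name, seq in contigs.items()
        if len(kmers(seq, k) & gene_kmers) >= min_shared
    )
```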

Copy number variation of selected nuclear genes 

The presence of 3 copies of each of the tefA, tub2 and perA genes in the N. coenophialum genome suggests a tri-parental hybrid origin, consistent with previous studies [17,23]. Similarly, the observation of 2 copies of each gene in the genomes of the other heteroploid endophyte taxa (FaTG-2, FaTG-3, FaTG-3-like and UNS) is compatible with a series of bi-parental hybrid origins. Although N. uncinatum has also previously been inferred to have arisen as a bi-parental hybrid, the presence of a sole tub2 gene copy suggests selective gene loss, as previously proposed to account for the current heteroploid constitution of this and other taxa. Apart from their concordance with earlier sequence-based studies [14,19], these results are also consistent with the complexity of SSR profiles from the same accessions [9], which typically contained up to 3 distinct amplicons from N. coenophialum genotypes and up to 2 amplicons from the other taxa in this study.
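The inference in the preceding paragraph can be expressed as a simple heuristic. The highest copy number observed across genes gives a lower bound on the number of contributing genomes, and genes with fewer copies are candidates for selective loss. The sketch below encodes that reasoning; the function name is hypothetical. It cannot detect a loss shared by every gene, and copy counts alone do not identify the progenitors.

```python
from typing import Dict, List, Union


def interpret_copy_numbers(copies: Dict[str, int]) -> Dict[str, Union[int, List[str]]]:
    """Heuristic reading of per-gene copy counts in a heteroploid endophyte genome.

    The largest copy count is taken as the minimum number of contributing
    (parental) genomes; genes with fewer copies are flagged as candidate losses.
    """
    if not copies:
        raise ValueError("no genes supplied")
    min_parents = max(copies.values())
    candidate_losses = sorted(gene for gene, n in copies.items() if n < min_parents)
    return {"min_parental_genomes": min_parents, "candidate_losses": candidate_losses}
```

For N. uncinatum, the inputs are the two perA copies (Table 3) and the single tub2 copy. The function then returns a minimum of two parental genomes and flags tub2 as a candidate loss. For N. coenophialum, three copies of each gene give three genomes and no flagged losses.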



Phylogeny of previously described fescue-derived endophytes

The present study has also permitted identification of those Epichloe species that are likely to be most closely related to the taxa that participated in hybrid origins. Previous phylogenetic studies based on two of the nuclear genes used in this study (tub2 and tefA), as well as the actl actin gene, have provided evidence for progenitor identity [15]. Three tall fescue-derived endophyte taxa have previously been included in such studies. N. coenophialum was proposed to have originated from E. festucae-, E. baconii- and E. typhina-like ancestors [18,19,42], while FaTG-2 and FaTG-3 were suggested to be derived from E. festucae- and E. baconii-like, and E. baconii- and E. typhina-like ancestors, respectively [17]. As summarised in Figure 7, the present study was consistent with these predictions in terms of affinities to contemporary E. festucae and E. typhina genotypes, but more distant relationships were observed for E. baconii. The group designated FGC1 in this study, which cannot be unequivocally attributed to an E. baconii-like progenitor, was also identified in a previous study and termed the 'Lolium-associated endophyte' (LAE) clade [23,43]. Furthermore, based on interpretation of the tree and network diagrams, two distinct E. typhina lineages appear to have contributed to formation of the N. coenophialum/N. uncinatum and FaTG-3/FaTG-3-like heteroploid genomes, respectively (Figure 7).
Phylogenetic reconstruction based on the perA gene sequence revealed a closer relationship of the FGC1 copies to E. amarillans than to the E. baconii genotype used in this study. However, the E. amarillans-derived tefA gene sequence demonstrated a close genetic relationship to E. baconii, and E. amarillans formed a sub-clade with E. baconii, E. festucae and N. lolii as well as the FGC1 clade, consistent with previous studies [43]. Anomalies observed between the gene-specific phylogenies in the present study may be due to different rates of molecular evolution between endophyte-specific (perA) and housekeeping (tefA and tub2) genes. In addition, inclusion of the complete tefA gene sequence of E. amarillans in the phylogenetic analysis may provide a higher level of resolution.

Phylogeny of mt genomes 

For N. coenophialum, FaTG-2 and FaTG-3, the mitochondrial gene complement-based analysis revealed closest relationships to E. festucae, suggesting that this or a closely related sexual species donated the cytoplasmic genome to the known heteroploid taxa. This conclusion is again consistent with previous studies, apart from the status of FaTG-3, which does not show strong similarity to the mt genomes of either E. baconii or E. typhina. This anomaly may be due to recombination between progenitor mitochondrial genomes following generation of a heteroplasmon by parasexual processes [24]. Such mechanisms have been demonstrated to operate in sexual crosses between E. typhina endophytes, although uniparental inheritance is generally observed [44]. Alternatively, accelerated evolutionary rates of mt DNA relative to nuclear DNA, which have been observed in animals, fungi and certain protist species [27], may contribute to lower phylogenetic affinities. In support of this explanation, substantial size differences were observed between the mt genomes of the two E. typhina isolates used in this study (85,030 bp for 9636 and 51,884 bp for 9340; Table 4). This suggests that extensive surveys of intraspecific diversity may be required to identify suitable candidates for progenitor status. However, mt genomes within a given heteroploid taxon were relatively uniform in size, suggesting that limited opportunities for evolutionary divergence have arisen.

In contrast, the protein-coding gene complement of the N. uncinatum mt genome is very closely related to that of one E. typhina lineage, consistent with an origin from this species during heteroploid formation. Furthermore, the mt genome of LpTG-2, which was inferred to have arisen as an N. lolii × E. typhina hybrid [45], shows high genetic similarity to the N. lolii mt genome, as confirmed by previous studies [37,46]. This latter observation suggests a relatively recent origin in evolutionary time, and the absence of complicating factors such as recombination between mt genomes.

Phylogeny of novel fescue-derived endophytes 

A close relationship was apparent between the previously described taxon FaTG-2 and the novel UNS endophyte group, both of which show affinities to E. festucae and to the putative progenitor of the FGC1/LAE lineages. This result was consistent with the SSR-based phenetic analysis, in which FaTG-2 and UNS accessions were located in sister groups within the same super-cluster of the phenogram [9]. Despite this close affinity, the present study confirmed that these two taxa are distinct, based on the formation of separate sub-clusters in both the FGC1 and Group 1 E. festucae-containing clades of the phylograms, as well as in the network diagram. The mt genome phylogram reinforced this distinction, suggesting that the UNS mt genome may have been contributed by the ancestor of the FGC1/LAE lineages, while the FaTG-2 mt genome, as previously described, is most closely related to E. festucae. In combination, the data suggest either that these two heteroploid taxa were derived from reciprocal hybridisation events between two pairs of closely related haploid species, or that divergence from common origins has occurred within each lineage.

Similarly, close relationships between the FaTG-3 and FaTG-3-like endophytes were revealed through the initial endophyte-specific SSR analysis [9]. In the present study, both FaTG-3 and FaTG-3-like endophytes display phylogenetic affinities to both E. typhina and the FGC1/LAE ancestor, with particularly strong similarity for the E. typhina-like gene copies. Similar relationships were obtained for FaTG-3-like endophytes (later designated as FaTG-4) in a previous study of tub2 phylogeny [14]. Furthermore, both groups also showed E. festucae-like mt genome structure. However, the two groups were identified in different host grass taxa: FaTG-3 genotypes were detected in Mediterranean hexaploid tall fescue accessions, while FaTG-3-like endophytes were obtained from decaploid tall fescue [9]. At the genomic level, the deletions commonly observed in the FGC1/LAE perA gene copy of FaTG-3 were not present in the FaTG-3-like endophytes. In general, the FGC1 copies of each nuclear gene were distinct between FaTG-3 and FaTG-3-like endophytes, suggesting that this genomic sub-component originated from related but distinct taxa.

Variation of nuclear gene structure 

Previous phylogenetic studies of tub2, tefA, actl and several alkaloid biosynthesis genes, including perA, were performed by PCR amplification and subsequent sequencing of the amplified products [17,21,23]. More recently, a study of tub2 phylogeny made partial use of whole genome sequence data [13]. In contrast, the present study was based solely on whole genome sequencing and subsequent independent assembly of entire tub2, tefA and perA genes, allowing comprehensive identification of insertion-deletion events. Two common deletions were identified in the FGC1-specific perA gene copies from FaTG-2, FaTG-3 and UNS endophytes. A previous study of perA gene phylogeny reported the presence of a 328 bp deletion within the perA coding region of the FaTG-2 genome [23], similar to observations of the FaTG-2, FaTG-3 and UNS genomes in this study, and an additional adjacent deletion of 627 bp within the same gene copy was identified here for all three taxa. As FaTG-2 endophytes have previously been demonstrated to produce peramine effectively, perA gene function in this taxon must be conferred not by the FGC1 gene copy but by the alternate copy, which is putatively derived from an E. festucae-like progenitor.
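Because whole genes were assembled, deletions such as the 328 bp and 627 bp events in the FGC1 perA copies can be read directly from a pairwise alignment against an intact reference copy. The following is a minimal sketch using toy sequences. It assumes the alignment has already been computed by an external aligner, and the function name is hypothetical.

```python
import re
from typing import List, Tuple


def deletions_in_query(ref_aligned: str, query_aligned: str,
                       min_len: int = 1) -> List[Tuple[int, int]]:
    """Report gap runs in the aligned query as (1-based reference start, deleted length in bp)."""
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have equal length")
    deletions = []
    for match in re.finditer(r"-+", query_aligned):
        start, end = match.span()
        # Reference bases preceding the gap run give its position on the reference
        ref_bases_before = len(ref_aligned[:start].replace("-", ""))
        # Columns where the reference is also gapped are not deleted reference bases
        deleted_ref = ref_aligned[start:end].replace("-", "")
        if len(deleted_ref) >= min_len:
            deletions.append((ref_bases_before + 1, len(deleted_ref)))
    return deletions
```

Setting `min_len` high (for example 100) would keep only large structural deletions of the kind discussed here and drop small indels.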

In addition to these structural changes, the perA sequence of E. baconii endophyte 9707 exhibited a deletion identical to that reported in the E. festucae endophyte E2368 [47], through loss of the reductase domain-encoding sequence at the 3'-terminus. Although this deletion is common among endophytes [3], none of the sequenced novel fescue endophytes was found to contain it.

Conclusion 

Complex and highly reticulated evolutionary relationships between Epichloe-Neotyphodium endophytes have been predicted on the basis of multiple nuclear genes and entire mitochondrial protein-coding gene sequences derived from independent assembly of whole genome sequence reads. Furthermore, results from the present study have confirmed the distinct status of the novel fescue endophyte taxa FaTG-3-like and UNS [9]. The designation of the FaTG-3-like taxon as FaTG-4, as proposed in a recent sequence-based phylogenomics analysis [14], is supported by the data presented here. For consistency, it is therefore also proposed that UNS is henceforth designated as FaTG-5. Apart from fundamental implications for evolutionary processes, the present study has provided information and resources for detection, discrimination and potential modification of agronomically important endophyte taxa.
The data sets supporting the results of this article are included within the article and its additional files. Nuclear protein-coding sequences and the reference N. lolii mt DNA sequence have been deposited in GenBank. Sequence alignments of mitochondrial protein-coding genes have been deposited in TreeBASE at URL: http://purl.org/phylo/treebase/phylows/study/TB2:S14923.

Additional files 



Additional file 1: Bootstrap consensus tree generated through parsimony analysis of tub2 gene sequences among an extended set of reference endophyte isolates and selected fescue endophytes.

Branches with bootstrap values greater than 70% from 1000 bootstrap replicates are marked next to each branch. Endophyte taxa are colour coded as indicated in the legend. Endophyte taxon abbreviations prior to isolate names are as follows: Nc = N. coenophialum, Nu = N. uncinatum, UNS = uncharacterised Neotyphodium species.

Additional file 2: Phylogram obtained for parsimony analysis of 
concatenated gene sequences of tub2, tefA and perA among 
reference endophyte isolates and selected fescue endophytes. 

Branches with bootstrap values greater than 70% from 1000 bootstrap replicates are marked next to each branch. Endophyte taxon abbreviations prior to isolate names are as follows: Nc = N. coenophialum, Nu = N. uncinatum, UNS = uncharacterised Neotyphodium species.

Additional file 3: Bootstrap consensus tree generated through 
maximum likelihood analysis of tub2 gene sequence among 
reference endophyte isolates and selected fescue endophytes. 

Branches with bootstrap values greater than 70% from 1000 bootstrap replicates are marked next to each branch.

Additional file 4: Bootstrap consensus tree generated through 
maximum likelihood analysis of tefA gene sequence among 
reference endophyte isolates and selected fescue endophytes. 

Branches with bootstrap values greater than 70% from 1000 bootstrap replicates are marked next to each branch.

Additional file 5: Bootstrap consensus tree generated through 
maximum likelihood analysis of perA gene sequence among 
reference endophyte isolates and selected fescue endophytes. 

Branches with bootstrap values greater than 70% from 1000 bootstrap replicates are marked next to each branch.

Additional file 6: GenBank accession numbers of the sequences represented in the phylogenetic study.

Additional file 7: Sequence output and de novo assembly statistics 
for sequenced fescue-derived endophytes. 